Spoken Language Synthesis: Experiments in Synthesis of Spontaneous Monologues

Authors

Shiva Sundaram

Shrikanth Narayanan

Abstract

While TTS technology has come a long way, there is an ongoing need to bring improved “naturalness” to synthesized speech. One predominant aspect of natural, spontaneous speech is its variability along several dimensions: vocabulary, prosodic features, paralinguistic elements and discourse markers. Such variability is typically avoided or minimized in conventional text-to-speech for the sake of high intelligibility. However, in applications requiring immersive anthropomorphic human-machine interfaces, including those with computer-generated avatars, there is a strong desire for synthesized speech output that mimics human speech. In this paper we investigate methods for, and the usefulness of, incorporating certain features that characterize fluent natural speech in order to increase the “naturalness” of synthesized speech. We propose a data-driven approach for modeling both speaker-independent and speaker-dependent spontaneous speech features at the lexical and acoustic levels (so-called VoiceFonts). This method has the potential to create unique, custom speaking styles of a target speaker. A simple limited-domain synthesizer was built on this idea using data from a classroom lecture and was used to synthesize 28 target utterances. Results from preliminary listening experiments with 19 volunteers showed that such an approach indeed improves naturalness, without significant loss in intelligibility, beyond the limitations of the underlying waveform synthesis. For example, in a 4-way choice listening test, subjects correctly identified natural speech with a probability of 0.6 and confused the clips synthesized in this work with natural speech with a probability of 0.27.
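The probabilities quoted at the end of the abstract are the kind of figure one obtains by normalizing listener responses per clip type. The following is only a minimal sketch of that arithmetic, not the authors' evaluation code: the function name `response_probabilities`, the labels `"natural"` and `"proposed"`, and the toy trial data are all hypothetical, and the abstract does not say which four options the listeners chose between.

```python
from collections import Counter

def response_probabilities(trials):
    """Estimate P(chosen label | true clip type) from (true, chosen) pairs."""
    pair_counts = Counter(trials)                      # how often each (true, chosen) pair occurs
    true_totals = Counter(true for true, _ in trials)  # number of trials per true clip type
    return {
        (true, chosen): n / true_totals[true]
        for (true, chosen), n in pair_counts.items()
    }

# Hypothetical toy responses (not data from the paper).
trials = (
    [("natural", "natural")] * 6      # natural clips judged natural
    + [("natural", "proposed")] * 4   # natural clips judged synthetic
    + [("proposed", "natural")] * 3   # synthetic clips mistaken for natural
    + [("proposed", "proposed")] * 7  # synthetic clips judged synthetic
)

probs = response_probabilities(trials)
print(probs[("natural", "natural")])   # share of natural clips identified correctly
print(probs[("proposed", "natural")])  # share of synthetic clips confused with natural
```

The function accepts any number of labels, so a genuine 4-way test would only need the additional response categories in the trial list.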

Similar resources

Discourse Structure in Spoken Language: Studies on Speech Corpora

A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new cor...

A generic algorithm for generating spoken monologues

The defining property of a Concept-to-Speech system is that it combines language and speech generation. Language generation converts the input concepts into natural language, which speech generation subsequently transforms into speech. Potentially, this leads to a more ‘natural sounding’ output than can be achieved in a plain Text-to-Speech system, since the correct placement of pitch accents a...

Voiced/unvoiced transitions in speech as a potential bio-marker to detect parkinson's disease

Several studies have addressed the automatic classification of speakers with Parkinson’s disease (PD) and healthy controls (HC). Most of the studies are based on speech recordings of sustained vowels, isolated words, and single sentences. Only a few investigations have considered read texts and/or spontaneous speech. This paper addresses two main questions still open regarding the automatic analy...

Parsing Spoken Language without Syntax : a Microsemantic Approach

Parsing spontaneous speech is a difficult task because of the ungrammatical nature of most spoken utterances. To overcome this problem, we propose in this paper to handle spoken language without considering syntax. We thus describe a microsemantic parser which is based solely on an associative network of semantic priming. Experimental results on spontaneous speech show that this parser st...

Some experiments in the Czech spontaneous speech recognition domain

A spoken/dialog interpretation system is proposed, using prosodic information systematically at all processing stages. A prosody module is used for parsing, dialog understanding, translation, generation and speech synthesis.

Publication date: 2004
